Optimising a handcrafted dialogue system design
Abstract
In the Spoken Dialogue System literature, all studies consider the dialogue move as the unquestionable unit for reinforcement learning. Rather than learning at the dialogue move level, we apply the learning at the design level, for three reasons: (1) to alleviate the high-skill prerequisite for developers, (2) to reduce the learning complexity by taking into account only the relevant subset of the context, and (3) to obtain interpretable learning results that carry reusable usage feedback. Unfortunately, tackling the problem at the design level breaks the Markovian assumptions required by most Reinforcement Learning techniques. Consequently, we use a recent non-Markovian algorithm called Compliance Based Reinforcement Learning. This paper presents the first experiment on online optimisation in dialogue systems. It reveals a fast and significant improvement in system performance, with on average one fewer system misunderstanding per dialogue.
Index Terms: Spoken Dialogue Systems, Reinforcement Learning, Online Learning, Hybrid System
Similar resources
Optimising Turn-Taking Strategies With Reinforcement Learning
In this paper, reinforcement learning (RL) is used to learn an efficient turn-taking management model in a simulated slot-filling task, with the objective of minimising the dialogue duration and maximising the task completion ratio. Turn-taking decisions are handled in a separate new module, the Scheduler. Unlike most dialogue systems, a dialogue turn is split into microturns and the Scheduler ma...
Hybridisation of expertise and reinforcement learning in dialogue systems
This paper addresses the problem of introducing learning capabilities into industrial handcrafted automata-based Spoken Dialogue Systems, in order to help developers cope with the task of designing dialogue strategies. While classical reinforcement learning algorithms position their learning at the dialogue move level, the fundamental idea behind our approach is to learn at a finer internal deci...
Markov Decision Processes with Continuous Observations for Dialogue Management
This work shows how a spoken dialogue system can be represented as a Partially Observable Markov Decision Process (POMDP) with composite observations consisting of discrete elements representing dialogue acts and continuous components representing confidence scores. Using a testbed simulated dialogue management problem and recently developed optimisation techniques, we demonstrate that this con...
Jason D. Williams, Pascal Poupart, and Steve Young: Partially Observable Markov Decision Processes with Continuous Observations for Dialogue Management
This work shows how a spoken dialogue system can be represented as a Partially Observable Markov Decision Process (POMDP) with composite observations consisting of discrete elements representing dialog acts and continuous components representing confidence scores. Using a testbed simulated dialogue management problem and recently developed optimisation techniques, we demonstrate that this conti...
Agenda-Based User Simulation for Bootstrapping a POMDP Dialogue System
This paper investigates the problem of bootstrapping a statistical dialogue manager without access to training data and proposes a new probabilistic agenda-based method for simulating user behaviour. In experiments with a statistical POMDP dialogue system, the simulator was realistic enough to successfully test the prototype system and train a dialogue policy. An extensive study with human subj...